(54) A system for interactive communication

(57) A system for providing a primarily audio environment
for World Wide Web access includes a system
for rendering structured documents using audio, an
interface for information exchange to users, a non-keyword
based WWW search system and a few miscellaneous
features. The system for rendering structured
documents using audio includes a pre-rendering system
which converts an HTML document into an intermediate
document and a rendering system which actually
generates an audio output. The interface includes a
non-visual browsing system and an interface to users for
visual browsing environments.
Description

The present invention relates to a system and method
for interactive communication, and more particularly
to a primarily audio environment for providing access to
information held in a network such as, for example,
the World Wide Web.

The World Wide Web (WWW) is rapidly becoming 
the single most important source of information for busi- 
nesses and consumers. With any source of information, 
a trade-off is involved between the value of information 
discovered and the opportunity cost of the time spent 
discovering it. Recent advances in technology, like the 
cellular phone, have helped people to improve the result 
of this trade-off, allowing them to better utilise time which 
would otherwise be unproductive, such as the time 
spent commuting to work, or exercising. The WWW 
however, is difficult to use in such situations, because 
existing WWW browsers require significant visual atten- 
tion and a high degree of interactivity. 

The present invention seeks to provide a system 
which can access the World Wide Web in a situation 
where a computer monitor and/or keyboard are not 
readily available. Audio playback is substituted for text 
and visual feedback, and verbal commands or physical 
manipulation of buttons and dials is substituted for vir- 
tual manipulation of GUI elements. Other systems, like 
Web-On-Call by Netphonic, allow similar functionality, 
but require extensive preparation at the server side. This 
means that users can only access the small number of 
servers which are customers of this service. The system 
of the present invention is capable of doing all of its work 
at the client side, so users can access any web server. 

Recently, a company known as The Productivity 
Works released a WWW browser called WebSpeak. 
This browser is intended for the visually impaired. Al- 
though this system includes large type and other visual 
features, it also uses text-to-speech synthesis to 
present HTML documents. The internal workings of this 
system are not known and no further information is avail- 
able to the applicants at this time. 

The present invention seeks to solve the above 
problems by providing a non-visual browsing environ- 
ment for the WWW. Such a system is also extremely 
useful for visually-impaired web surfers. 

In accordance with the invention, a system for interactive
communication between a user and an information
source having the features of claim 1 is provided.
Furthermore, a method for providing interactive communication
between a user and an information source in
accordance with the invention has the features of claim
10.

An embodiment of the invention comprises a system
for rendering structured documents using audio, an
interface for information exchange to users, a non-keyboard
based WWW search system and a few miscellaneous
features. The system for rendering structured
documents using audio includes a pre-rendering system
which converts an HTML document into an intermediate
document and an audio rendering system which generates
an audio output. The interface includes a non-visual
browsing system and an interface to users for visual
browsing environments.

For a better understanding of the present invention, 
and to show how it may be brought into effect, reference 
will now be made, by way of example, to the accompa- 
nying drawings, in which: 
Figure 1 illustrates a block diagram of one embodiment
of the present invention.

Figure 2 illustrates the pre-rendering analysis of an
HTML document which takes place in the rendering
structured documents feature.

Figure 3 illustrates the process of computing the
speech markup information for the intermediate document
of the pre-rendering analysis.

Figure 4 illustrates the process of rendering the
intermediate document and generating speech and other
audio.

Figure 5 illustrates a sample interface panel for the 
WIRE system of the present invention. 

The present invention will now be described with
reference to a particular embodiment of the invention
called the WIRE system.

The WIRE system of the present invention is a set
of software technologies and a user interface paradigm
which can be used to access the World Wide Web in a
situation where a computer monitor and/or keyboard are
not readily available. As stated above, the idea is to
substitute audio playback for text and visual feedback, and
to substitute physical manipulation of buttons and dials
for virtual manipulation of GUI elements. The WIRE system
of the present invention does all of its work at the
client side, so users can use it to access any web server.
In order to achieve this object, several new technologies
and methodologies were developed which are described
below.

Figure 1 illustrates one embodiment of the WIRE
system 10 of the present invention. The key components
of the WIRE system 10 are: a system for rendering
structured documents using audio 11, an interface 12
for information exchange to users, a non-keyword
based WWW search system 13 and a few miscellaneous
features 14. The system for rendering structured
documents using audio 11 includes a pre-rendering system
15 which converts an HTML document into an intermediate
document and an audio rendering system 16
which generates an audio output based on the intermediate
document. Interface 12 includes a non-visual
browsing system 17 and an interface to users for visual
browsing environments 18.

The first component is the system for rendering
structured documents using audio feature 11. Existing
WWW browsers render HTML documents visually on a
computer monitor. As an alternative, the WIRE system
of the present invention provides an audio rendering of
HTML documents, which is a way of presenting WWW






EP 0 848 373 A2 



content, including text, structure, and representation,
using only audio playback. The architecture of the WIRE
document rendering module is applicable to any kind of
structured document but, for explanation purposes, the
implementation described below is for HTML documents,
since they are the most prevalent type of structured
document on the WWW.

The process used in WIRE for rendering an HTML
document consists of two parts: the pre-rendering system
15, which converts the HTML document into an
intermediate document, and the audio rendering system
16, which generates the audio output based on the
intermediate document. Some parts of the two processes
may occur concurrently, but they are logically separate.

The pre-rendering analysis is shown in Figure 2. 
The first stage of analysis is to divide the HTML docu- 
ment into logical segments as shown by a Document is 
Segmented Stage 21. This is accomplished by looking 
for syntactic elements in the HTML document which 
generally indicate the boundaries between areas of dif- 
ferent content. The syntactic elements which are used 
are horizontal rules, and tags marking the beginning of 
tables, rows and columns. These elements are then 
considered segment boundaries. 
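
By way of illustration only, the segmentation stage might be sketched in Python as follows. The class name Segmenter, the function segment_text and the reading of "columns" as table cells (td and th tags) are assumptions of this sketch and are not taken from the embodiment, which does not specify an implementation. The sketch keeps only the text of each segment; a fuller version would also retain the anchors needed by the next stage.

```python
from html.parser import HTMLParser

# Tags treated as segment boundaries: horizontal rules and the start of
# tables, rows and cells ("columns" is assumed to mean <td>/<th>).
BOUNDARY_TAGS = {"hr", "table", "tr", "td", "th"}


class Segmenter(HTMLParser):
    """Collects the non-tag text of a document, split at boundary tags."""

    def __init__(self):
        super().__init__()
        self.segments = [""]

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag in BOUNDARY_TAGS:
            self.segments.append("")

    def handle_data(self, data):
        self.segments[-1] += data


def segment_text(html):
    """Return the text of each non-empty segment, whitespace-normalized."""
    parser = Segmenter()
    parser.feed(html)
    parser.close()
    return [" ".join(s.split()) for s in parser.segments if s.strip()]
```

Each boundary tag simply opens a new segment, so nested table markup produces one segment per cell.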

The output of the Document is Segmented Stage 
21 is sent to a Segments are Categorized Stage 22. This 
second stage of analysis categorizes the segments of 
the HTML document as either navigation segments or 
content segments. The categorization is accomplished 
by calculating the link density of each segment. Link 
density is a measure of the amount of content within a 
given segment contained within anchors for hyperlinks. 
In the WIRE system of the present invention, an empirical
formula is used to calculate the link density, D:

$$D = \frac{C_{HREF} + (K \cdot L_I)}{C}$$

where $C_{HREF}$ is the number of non-tag characters in the
segment which appear inside of HREF tags, $C$ is the
total number of non-tag characters in the segment, and
$L_I$ is the number of hyperlinks within image maps in the
segment. $C$ is always assigned a value of at least 1,
even if there are no non-tag characters in the segment.
The value $K$ represents the weight given to image map
links and empirically is determined to have a value of 5.
If a segment has a link density D > 0.7, it is categorized
as a navigation segment, otherwise it is categorized as
a content segment. The value 0.7 was empirically
determined to be appropriate.
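
The categorization rule above can be illustrated with a minimal Python sketch. The function and parameter names are assumptions chosen for this example; only the formula, the weight K = 5, the floor of 1 on C and the threshold 0.7 come from the description.

```python
def link_density(c_href, c_total, image_map_links, k=5):
    """Link density D = (C_HREF + K * L_I) / C, with C floored at 1."""
    c = max(c_total, 1)  # C is always assigned a value of at least 1
    return (c_href + k * image_map_links) / c


def categorize(c_href, c_total, image_map_links, threshold=0.7):
    """Return "navigation" if D exceeds the threshold, else "content"."""
    d = link_density(c_href, c_total, image_map_links)
    return "navigation" if d > threshold else "content"
```

For example, a segment of 100 non-tag characters with 80 of them inside anchors has D = 0.8 and is treated as a navigation segment, while a segment containing nothing but an image map with one link has D = 5.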

The output of the Segments are Categorized Stage 
22 is fed to a Document Sectioning Stage 24. This third 
stage of analysis is to determine the section structure of 
the HTML document. The section structure differs from 
the segmentation, and the eventual use of each will be 
discussed shortly. 

Only content segments are analyzed for section information.
Each content segment is considered a top-level
section. Within content sections any occurrence of
a header tag or a fontsize tag is noted. In HTML, header
tags are valued from 1 to 6 in decreasing order of prominence,
while fontsize can range from 1 to 7 in increasing
order of size. In the sectioning process, fontsize tags
larger than the default text size are treated as header
tags with prominence equal to 8 minus their size value.
Relative fontsize tags, such as <fontsize +2>, are first
converted to absolute sizes by applying them to the default
text size value. Fontsize tags defining sizes smaller
than the default size are ignored.
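
As a hedged illustration, the conversion of a fontsize value into a header prominence might look like the following Python sketch. The default size of 3, the clamping of the result to the range 1 to 7 and the function name font_prominence are assumptions of this example; the description fixes only the "8 minus size" rule and the handling of relative and small sizes.

```python
DEFAULT_FONT_SIZE = 3  # assumption: the conventional HTML default size


def font_prominence(size_attr, default=DEFAULT_FONT_SIZE):
    """Map a fontsize value such as "5" or "+2" to a header prominence.

    Returns None when the size is not larger than the default, in which
    case the tag is not treated as a header.
    """
    text = size_attr.strip()
    if text[0] in "+-":
        size = default + int(text)  # relative size applied to the default
    else:
        size = int(text)            # absolute size
    size = max(1, min(7, size))     # assumption: clamp to the valid range
    if size <= default:
        return None
    return 8 - size
```

With a default of 3, a relative size of +2 becomes absolute size 5 and is treated like a header of prominence 3.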

The result is a hierarchy of header tags of varying
prominence. Sections are then defined hierarchically,
using the header tags as boundaries, with the top-level
sections forming the top of the hierarchy, and the header
tags denoting the subsections, sub-subsections, and as
many further gradations as are necessary to account for
the number of prominence values present in the document.
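
One possible way of turning the header prominences of a content segment into hierarchical section numbers is sketched below in Python. The function name number_sections and the choice to rank prominence values within a single segment are assumptions made for illustration; the description does not prescribe how the hierarchy is computed.

```python
def number_sections(segment_number, prominences):
    """Assign dotted section numbers to the headers of one content segment.

    segment_number is the number of the top-level section; prominences
    lists the prominence of each header in document order (1 = highest).
    """
    levels = sorted(set(prominences))  # distinct values, most prominent first
    depth_of = {p: i + 1 for i, p in enumerate(levels)}
    counters = [0] * len(levels)
    labels = []
    for p in prominences:
        depth = depth_of[p]
        counters[depth - 1] += 1
        for i in range(depth, len(counters)):
            counters[i] = 0  # a new section restarts its finer gradations
        parts = [segment_number] + counters[:depth]
        labels.append(".".join(str(n) for n in parts))
    return labels
```

Only as many levels are created as there are distinct prominence values. A limitation of this sketch is that a less prominent header appearing before any more prominent one receives a zero component in its number.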

The output of the Document Sectioning Stage 24 is
fed to a Computation of Speech Mark-Up Information
Stage 26. This final step of the analysis is to create an
intermediate document which can be interpreted by a
text-to-speech engine. Fundamentally, this step produces
the meta-information, in the form of commands,
which will cause the text-to-speech engine to vary its
voice, tone, rate and other parameters to adequately
convey the information within the HTML document. In
this example, commands are given for a Microsoft SAPI
compliant text-to-speech engine. Computation of
Speech Mark-Up Information Stage 26 also determines
which text will be presented to the user. Computation of
Speech Mark-Up Information Stage 26 is fully described in
Figure 3. Finally, additional meta-information is provided to
denote segment boundaries for use during playback.

One type of meta-command which the procedure of
the present invention produces is a command to switch
from one voice to another, depending on the type of text
being spoken. In this example, four voices are used for
the rendering. Voices one and two are used for normal
text passages and one of them is always considered to
be the active voice. The variable V stores the current
active voice. Voice three is used for section headings
and titles, and voice four is used for hyperlink anchor
text. The exact parameters of each voice are not critical,
and different users may choose different voices.

As described in Figure 3, the process of generating
the intermediate document consists of parsing the original
structured document, along with the segmentation
and sectioning information determined in the previous
phases, and producing output accordingly. Normally,
non-tag text is simply echoed to the intermediate document
while tag information is discarded, but additional
output is generated in the following cases. At the beginning
of each segment, a marker is written indicating the
new segment and its type. At the beginning of each section,
subsection, or finer granularity section, text is written
to the intermediate document stating "Section X",
where X is the section number. For example, at the start
of the second sub-subsection of the third subsection of the
first content segment (top-level section), the text "Section 1.3.2"
would be written. The voice is switched to voice three to
read this section header and then switched back to the
active voice, V. This is accomplished by writing voice
commands into the intermediate file before and after the
new text. When a title or header tag is found in the
HTML, the voice is switched to voice three, the text inside
the tag is written, and the voice is switched back to
the active voice. When a strong or b (bold) tag is encountered
in the HTML, or when the text color is
changed to a non-default color inside a section primarily
using the default color, an /EMP tag is written to the output
document before each word of the text within that
tag. If an HREF tag is encountered, a link marker is
placed in the document, the voice is switched to voice
four, the text inside the tag is written, and the voice is
switched back to the active voice, V. If a third consecutive
p (paragraph) tag is found without the voice having
been varied, the voice is switched to voice number three
minus the active voice number. This effectively toggles
the active voice between one and two. The V flag is used
to store the active voice number, and this is also updated
to three minus the active voice number. The purpose of
this change in the active voice is to break the monotony
of having the same synthesized voice play continuously
for a great length of time.
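
The voice-switching rules above may be illustrated by the following Python sketch, offered under stated assumptions. It represents the input as a list of (kind, text) events and the output as a list of abstract commands rather than the control tags of any particular text-to-speech engine; these representations, the function name emit_intermediate and the interpretation that any heading or link counts as "the voice having been varied" are all assumptions of the example.

```python
HEADING_VOICE = 3  # section headings and titles
LINK_VOICE = 4     # hyperlink anchor text


def emit_intermediate(events):
    """Translate parsed events into voice and speech commands.

    events is a list of (kind, text) pairs, where kind is one of
    "p" (paragraph tag), "text", "header" or "link".
    """
    active = 1  # the variable V: the active voice, either 1 or 2
    paragraphs_since_change = 0
    out = [("voice", active)]
    for kind, text in events:
        if kind == "p":
            paragraphs_since_change += 1
            if paragraphs_since_change == 3:
                active = 3 - active  # toggles between voices 1 and 2
                out.append(("voice", active))
                paragraphs_since_change = 0
        elif kind in ("header", "link"):
            if kind == "link":
                out.append(("link_marker", None))
            voice = HEADING_VOICE if kind == "header" else LINK_VOICE
            out.append(("voice", voice))
            out.append(("say", text))
            out.append(("voice", active))  # switch back to the active voice
            paragraphs_since_change = 0
        else:
            out.append(("say", text))
    return out
```

The expression 3 - active is what toggles the active voice: it maps 1 to 2 and 2 to 1.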

During the rendering process, shown in Figure 4,
the intermediate document is parsed. In general, all of
the text and meta-information is passed directly to the
text-to-speech engine, which then produces an audio
output. The exception is that any given segment may be 
skipped depending on the browsing mode of the user. 
Browsing mode is discussed more completely below. In 
addition, whenever a link mark is encountered, an alert 
noise may be played to alert the user of the presence of 
a hyperlink. 

The following will describe a non-visual browsing 
system. Existing WWW browsers use GUI elements and 
visual feedback to control navigation through web- 
space. To replace these in a non-visual environment, the 
WIRE system uses the several techniques described 
below. Figure 5 shows a sample interface panel 50 as 
a reference for the discussion. The interface panel 50 
of this embodiment is styled after the controls of a car 
radio, to demonstrate how the WIRE system interface 
can be integrated into such a system. However, the exact
form of the interface panel is not an essential element 
of the invention. 

The basic method of navigation in a WIRE environ- 
ment is to listen to the audio feedback of the browser 
and press the follow button 51 to follow the most recently 
played hyperlink. Hyperlinks are identified to the user 
audibly in two ways. First, a notification sound effect is 
played just before the hyperlink anchor is played. Second,
the hyperlink anchor itself is spoken using a different
voice from the rest of the text. The most recently
played hyperlink is called the active link. The volume of 
the audio signal would be controlled by an audio volume 
dial 52 as shown in Figure 5. 

As listening to an entire web-page is not always
practical, the WIRE system of the present invention
could offer four modes of browsing. Each mode allows
the user a specific kind of rendering of a page based on
the information in the intermediate document. In normal
mode, all segments are played in entirety. In navigation
mode, only navigation segments are played. In content
mode, only content segments are played, but navigation
segments are announced. The announcement of navigation
segments is given as "Navigation section, with N
links", where N is the number of hyperlinks in that segment.
In header mode, only headers are played. Headers
can either be from header tags, or from fontsize tags,
as described above. All modes begin a document by
speaking the title of the document.
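
The four browsing modes can be summarized by a short Python sketch, given purely as an example. The dictionary layout of a segment (keys "type", "text", "links" and "headers") and the function name playback_plan are assumptions introduced here; the intermediate document format of the embodiment is not reproduced.

```python
def playback_plan(title, segments, mode):
    """Return the list of utterances to be played for one document.

    Each segment is a dict with keys "type" ("navigation" or "content"),
    "text", "links" (number of hyperlinks) and "headers" (list of str).
    """
    plan = [title]  # every mode begins by speaking the title
    for seg in segments:
        if mode == "normal":
            plan.append(seg["text"])
        elif mode == "navigation":
            if seg["type"] == "navigation":
                plan.append(seg["text"])
        elif mode == "content":
            if seg["type"] == "content":
                plan.append(seg["text"])
            else:
                plan.append(f"Navigation section, with {seg['links']} links")
        elif mode == "header":
            plan.extend(seg["headers"])
        else:
            raise ValueError(f"unknown browsing mode: {mode}")
    return plan
```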

Users can "scroll" through a document using the
scan-forward 53, scan-reverse 54 and pause 55 controls.
These controls function somewhat differently depending
on the current browser mode. In normal mode
and content mode, the scan-forward control causes the
playback to skip to the next line in the intermediate document.
The scan-reverse control causes the playback
to skip to the previous line of the intermediate document.
In navigation mode, the scan-forward control causes the
playback to skip to the next hyperlink anchor and the
scan-reverse control causes the playback to skip to the
previous hyperlink anchor. In header mode, the scan-forward
control causes the playback to skip to the next
header and the scan-reverse control causes the playback
to skip to the previous header.

The WIRE system maintains a traditional WWW
browser history list. Users may move back and forth in
the history list, one page at a time, using the history list
control 58. The history list control can be implemented
as a dial to allow quicker access, in which case an audible
signal, such as a click, is used each time a page
is reached. By listening to the number of clicks, the user
can gauge his progress through the list.

As described above, the rendering of each document
begins with the speaking of the document's title,
so the user can quickly discover his location.

The WIRE system provides for immediate access
to a number of user selected WWW documents, through
the use of the favorite control(s) 56. This control is analogous
to the bookmarks of a traditional hypertext browser
or the preset station buttons on a radio, in that it symbolizes
a persistent address which the WIRE system will
jump to immediately. The address in this case is a WWW
URL. The favorite control also allows the active page
(the one being rendered, or most recently rendered) to
be marked as a favorite. A discussion of how the favorite
button can be modified off-line is included below.

The following will describe an interface to visual
browsers. The WIRE system is not intended as a user's
primary WWW browser but rather as a browser which
he may use in environments where a visual display is
not available or not practical. As a result, the user may
employ a standard visually oriented browser at other
times, and may wish to pass information between the
WIRE compliant browser and the visually oriented
browser. The WIRE system supports this kind of information
transfer in two ways.

The following will describe off-line designation of fa- 
vorites. The favorites control described above may be 
reset off-line using a visually based browser or web-authoring
tool. One implementation of this scheme is to
store the favorites as a structured document, such as
an HTML document, on a web-server. The WIRE device
may download this document at start-up time, and make
any changes by posting them to the web-server. Simi- 
larly, the user could access and modify the document 
using any other browsing or authoring system when not 
in a WIRE environment. 

The following will describe document flagging. The 
WIRE system provides an additional flag page control 
57 for flagging documents. Flagging a document de- 
notes it as being of interest outside of the WIRE envi- 
ronment, and brings it to the attention of non-WIRE 
browsers. When using another browser, the user has 
quick (bookmark-like) access to any such pages. This 
scheme may be implemented by using a structured doc- 
ument to store the URLs of each of the flagged pages 
as a set of hyperlinks, and storing this document at a 
web-server. The WIRE compliant browser automatically 
updates the document by adding a hyperlink to the URL 
of any flagged page either through HTTP post or FTP. 
Any other WWW browser may access this page (or per- 
haps bookmark it), thus acquiring a set of hyperlinks to 
the flagged pages, and in turn gaining quick access to 
the flagged pages themselves. 

The following will describe a non-keyboard web 
search technique. Typical WWW browsing involves the 
use of search engines to find documents on a given sub- 
ject. In most traditional WWW environments, search en- 
gines are keyword driven, that is, queries are performed 
using keywords entered by the user with a text input de- 
vice. The WIRE system offers two alternatives to such 
a system, which are more practical in an environment 
which lacks a monitor and keyboard. 

WIRE allows users to search for WWW documents
through a progressive refinement style search. In this
system the user is prompted with a number of broad categories
from which to choose, for example: Business
and Commerce, Science, Literature and the Arts, etc.
The user chooses one by using the follow control as described
above, and may scan the list using the scan-forward
and scan-reverse controls as described above.
After a category is selected, the user is prompted with
a list of more specific categories from which to choose,
and then yet another list of still more refined categories.
The process continues until eventually some (or all) of
the list elements are refined enough to represent specific
documents rather than categories. If the user selects
one of these elements, the WIRE system loads and
renders the associated document. Browsing then continues
as normal.

Other features of this system include: a control to
move back to the previous list, automatic looping at the
end of a list, and a control that creates a category list
from all of the categories from which the current category
can be derived. There is also a control to move to
the topmost category list.

This system is similar to the visual-based progres- 
sive refinement system used by the Yahoo! company's 
search engine. 

The following will describe rooted DAG browsing
without visual feedback. The category hierarchy in the
progressive refinement technique described above can
be represented as a rooted directed acyclic graph (RDAG).
The browsing scheme can thus be generalized as
a method for browsing RDAGs in a non-visual environment.
The WIRE scheme of the present invention offers
the following features for RDAG browsing:

1. Automatic, looping traversal across sibling nodes
until an operation is selected.

2. An operation to move back to the parent node
from which you arrived, and the list of its siblings.

3. An operation to move to the parent node from
which you arrived, and the list of all other parents
of the node being departed.

4. An operation to move to the child list of a node.

5. An operation to move directly to the root.

The present invention has the benefit that the user
always has some context of their current position, since
they can always listen to the rest of the nodes in the list
being presented. It also has the advantage over purely
tree based systems that it allows the same node to have
several parents. This allows more sophisticated representation
hierarchies than are possible in a simple tree.
Operation three therefore offers a powerful browsing option
not available in tree-based systems.

Note that if operation two is performed in certain
combinations after operation three, the result is not defined
formally above. In this case, operation two should
be treated as operation three.
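
The five RDAG operations can be sketched in Python as follows. This is an illustrative model only: the class name RdagBrowser, the method names, the adjacency-dictionary representation and the use of a trail of visited lists are assumptions of the sketch. In particular, the sketch resolves the undefined case by clearing the trail after operation three, so that a subsequent operation two falls back to operation three.

```python
class RdagBrowser:
    """Non-visual browsing of a rooted DAG given as {node: [children]}."""

    def __init__(self, children, root):
        self.children = children
        self.root = root
        self.parents = {}
        for parent, kids in children.items():
            for kid in kids:
                self.parents.setdefault(kid, []).append(parent)
        self.to_root()

    @property
    def current(self):
        return self.items[self.index]

    def to_root(self):  # operation 5
        self.items, self.index, self.trail = [self.root], 0, []
        return self.current

    def advance(self):  # operation 1: looping traversal of the list
        self.index = (self.index + 1) % len(self.items)
        return self.current

    def descend(self):  # operation 4: move to the child list
        kids = self.children.get(self.current, [])
        if kids:
            self.trail.append((self.items, self.index))
            self.items, self.index = kids, 0
        return self.current

    def back(self):  # operation 2: parent arrived from, among its siblings
        if not self.trail:
            return self.up_all_parents()  # treated as operation 3
        self.items, self.index = self.trail.pop()
        return self.current

    def up_all_parents(self):  # operation 3: list of all parents
        parents = self.parents.get(self.current, [])
        if not parents:
            return self.current  # already at the root
        if self.trail:
            items, index = self.trail[-1]
            came_from = items[index]
        else:
            came_from = parents[0]
        self.items, self.index = parents, parents.index(came_from)
        self.trail = []  # the earlier path no longer describes this list
        return self.current
```

In a hierarchy where "Biotech" is listed under both "Science" and "Business", operation three from "Biotech" presents the list of both parents, positioned at the one through which the user arrived.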

The following will describe speech recognition keyword
entry. WIRE offers an alternative searching mechanism
to the system described above. If quality speech-to-text
(speech recognition) software is available, the
user may employ traditional keyword searches by
speaking the keyword. Whether the word can be spoken
explicitly, or spelled out letter-by-letter instead, is a function
of the quality of the speech-to-text mechanism.

The WIRE system of the present invention has several
other miscellaneous features such as E-mail support.
In addition to acting as a WWW browser, the WIRE
system can also serve as an Internet mail reader. The
system can read e-mail messages to the user by using
text-to-speech translation. The rendering of each message
begins with the From and Subject lines from the
mail header using a different voice. The user may scan
the message queue using the history control described
above.

WWW URLs which are embedded in an e-mail mes- 
sage are recognized and rendered as hyperlinks. They 
may be followed using the follow control described 
above. If such an action is performed, WIRE's WWW
browser functions are automatically invoked and re- 
place the e-mail reader functions. 
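
A simple way of recognizing embedded URLs in a message body is sketched below in Python. The regular expression, the list of schemes, the stripping of trailing punctuation and the function name split_links are assumptions of this example; the embodiment does not state how URLs are detected.

```python
import re

# Assumption: a URL starts with http://, https:// or ftp:// and runs
# until whitespace, an angle bracket or a double quote.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s<>\"]+")


def split_links(body):
    """Split a message body into ("text", ...) and ("link", ...) parts."""
    parts = []
    pos = 0
    for match in URL_PATTERN.finditer(body):
        # Sentence punctuation directly after a URL is not part of it.
        url = match.group().rstrip(".,;:!?)")
        if match.start() > pos:
            parts.append(("text", body[pos:match.start()]))
        parts.append(("link", url))
        pos = match.start() + len(url)
    if pos < len(body):
        parts.append(("text", body[pos:]))
    return parts
```

Each "link" part can then be rendered in the hyperlink voice, preceded by a link marker, in the same way as an anchor in an HTML document.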

The following will describe support for audio files 
and streaming. The WIRE system is able to render dig- 
ital audio data stored in the form of WWW files. If a URL 
represents an audio file, either structured data (such as 
MIDI) or unstructured data (such as WAV), the file is rendered
in the customary way. During the playback, the
scan-forward, scan-reverse and pause controls function
in the traditional way. 

The WIRE system also supports streaming of audio 
data using the common streaming protocols, such as 
RealAudio. 

The following will describe radio and commercials 
during document download. During the time spent 
downloading a new WWW document, the WIRE system 
may automatically switch playback to a live commercial 
radio feed. The particular radio station featured may be 
chosen by the user in some cases, or may be specified 
by the ISP in other cases. 

One way in which this feature can be used is in con- 
junction with an all-advertisement radio station. By gain- 
ing wide exposure to WIRE users, advertisers may be 
encouraged to subsidize the WIRE connection fees, al- 
lowing wider audiences to use WIRE. 

Alternatively, the WIRE system could deliver prere- 
corded commercials during downloading periods, re- 
freshed occasionally from the WIRE service provider. 
Since the WIRE service provider would have access to 
a profile of which web sites the user had visited, these 
commercials might be targeted specifically to the indi- 
vidual user. Users may agree to this sort of policy if the 
advertisements significantly reduced the cost of the 
WIRE service. 

The WIRE system of the present invention is a col- 
lection of technologies which allow access to the WWW 
in the absence of a visual display. Although WIRE is not 
intended as a user's primary WWW browser, WIRE
technology may be useful to those whose time is pre- 
cious and who will benefit from WWW access from an 
automobile or a hand-held device, or any other non- 
standard setting. WIRE also brings WWW access to the 
visually-impaired. 

The WIRE system collects several existing technol- 
ogies including text-to-speech and internet protocols 
and enhances them with new technologies and methodologies,
notably: an audio rendering scheme for
structured documents, a non-visual WWW browsing
scheme, a non-keyword WWW search scheme and an
interface to traditional browsers. The combination of
these new and old technologies creates a tool which will
help to increase the efficiency of commuters, travelers,
and exercisers everywhere, and bring the World Wide
Web to the visually-impaired for perhaps the first time.

Using a WIRE compliant device, a user could con- 
nect to the WWW from his car or from a hand-held sys- 
tem. The user would then be able to browse pages, con- 
duct searches, and listen to information from the web 
being played audibly. Pages of interest could also be
"flagged" for later investigation from a visual browser, 
and sites found using a visual browser could be flagged 
for later playback by the WIRE compliant device. Using 
this system, a user is freed from the visual attentiveness 
required by traditional browsers and is able to better uti- 
lize his otherwise unproductive time by browsing the 
WWW. 

The user interface has been described herein with 
reference to the sample interface panel 50, as shown in
Figure 5. However, it would also be possible to provide 
browsing functions, for example "follow" or "scan for- 
ward", by suitable speech recognition and command 
software so that verbal commands can be interpreted 
and acted upon by the system of the present invention. 
This embodiment of the invention is of particular value 
in situations where physical manipulation of buttons on 
the sample interface panel 50 would be difficult. 

It is not intended that this invention be limited to the
hardware or software arrangement or operational procedures
shown and disclosed. This invention includes all of
the alterations and variations thereto as encompassed
within the scope of the claims as follows.



Claims

1. A system (10) for interactive communication between
a user and an information source, characterised
in that the system (10) comprises means (11)
for rendering structured documents obtained from
the information source using audio, and a non-keyboard
based search system (50) controllable by the
user.

2. A system (10) as claimed in claim 1, wherein said
means (11) for rendering structured documents using
audio comprises:

a pre-rendering system (15) which converts a
structured document into an intermediate document; and

an audio rendering system (16) which generates
an audio output from the intermediate document.

3. A system (10) as claimed in claim 2, wherein the 
pre-rendering system (15) and the audio rendering 
system (16) operate concurrently. 
4. A system (10) as claimed in claim 2 or 3, wherein
said pre-rendering system (15) comprises:

a segmenting document system (21) for dividing
said structured document into logical segments;

a categorizing document system (22) for categorizing
said logical segments as either navigation
segments or content segments;

a document sectioning system (24) for determining
section structure of said structured document; and

a speech mark-up information system (26) for
creating said intermediate document which can
be interpreted by a text-to-speech conversion
means.

5. A system (10) as claimed in claim 4, wherein said 
categorizing document system (22) comprises: 

calculation means for calculating a link density
of each of said logical segments according to
the formula:

$$D = \frac{C_{HREF} + (K \cdot L_I)}{C}$$

where $D$ is said link density, $C_{HREF}$ is the
number of non-tag characters in each of said
logical segments which appear inside HREF
tags, $C$ is the total number of non-tag characters
in each of said logical segments, $L_I$ is the
number of hyperlinks within image maps in
each of said logical segments and $K$ represents
a weight given to image map links.

6. A system (10) as claimed in claim 4 or 5 wherein 
said document sectioning system (24) comprises: 

hierarchically sectioning means wherein sec- 
tions are defined hierarchically using header tags 
as boundaries with top-level sections forming a top 
of a hierarchy and said header tags denoting sub- 
sections, sub-subsections and as many further gra- 
dations as are necessary to account for the number 
of prominence values present in said structured 
document. 
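
By way of illustration only, hierarchical sectioning of this kind might be sketched as follows. The sketch assumes that the header tags are the HTML elements H1 to H6, that the prominence value of a header is its numeric level, and that the distinct levels actually present in the document are ranked to give the nesting depth; the names `Section` and `build_sections` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Matches <h1>...</h1> through <h6>...</h6>, capturing level and inner markup.
HEADER_RE = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


@dataclass
class Section:
    title: str
    depth: int                      # 0 = top-level section
    children: list = field(default_factory=list)


def build_sections(html):
    """Return the top-level sections of a document as a tree."""
    headers = [
        (int(m.group(1)), re.sub(r"<[^>]+>", "", m.group(2)).strip())
        for m in HEADER_RE.finditer(html)
    ]
    # Use only as many gradations as there are prominence values present:
    # a document using h1, h3 and h4 gets depths 0, 1 and 2.
    levels = sorted({level for level, _ in headers})
    depth_of = {level: i for i, level in enumerate(levels)}

    root = Section("(document)", -1)
    stack = [root]
    for level, title in headers:
        depth = depth_of[level]
        # Close every open section that is at the same depth or deeper.
        while stack[-1].depth >= depth:
            stack.pop()
        node = Section(title, depth)
        stack[-1].children.append(node)
        stack.append(node)
    return root.children
```

Ranking the levels that occur, rather than using the raw header numbers, means that a document which skips a header level does not produce an empty gradation in the hierarchy.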

7. A system (10) as claimed in one of claims 4-6, 
wherein: 

said speech mark-up information system (26) 
also determines which text shall be presented to 
said users. 

8. A system (10) as claimed in one of claims 4-7, 
wherein said speech mark-up information system 
(26) comprises: 

meta-information generation means for producing
meta-information in a form of commands which will
cause said text-to-speech engine to vary voice,
tone, rate, and other parameters to adequately
convey information within said structured document.
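
By way of illustration only, the generation of such meta-information might be sketched as follows. The bracketed command syntax, the element kinds and the voice names and rates in the table are all hypothetical; the command set of an actual text-to-speech engine would be substituted for them.

```python
# Hypothetical mapping from element kind to speech parameters.
STYLES = {
    "title":  {"voice": "announcer", "rate": 0.9},
    "header": {"voice": "announcer", "rate": 1.0},
    "link":   {"voice": "alternate", "rate": 1.0},
    "text":   {"voice": "default",   "rate": 1.0},
}


def to_intermediate(elements):
    """Turn (kind, text) pairs into text interleaved with speech commands.

    A command line is emitted only when the speech parameters change, so
    consecutive elements of the same kind share a single command.
    """
    lines = []
    current = None
    for kind, text in elements:
        style = STYLES.get(kind, STYLES["text"])
        if style != current:
            # Hypothetical command syntax, not that of any real engine.
            lines.append("[[voice={voice} rate={rate}]]".format(**style))
            current = style
        lines.append(text)
    return "\n".join(lines)
```

In this sketch a hyperlink is spoken in a different voice from the surrounding text, so that the listener can tell from the audio alone which passages may be followed.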

9. A system (10) as claimed in any preceding claim,
further comprising at least one of an E-mail support
system; an audio files and streaming support system,
a non-visual browsing system and an interface
for visual browsing environments; a system for
providing radio and/or commercials during document
download.

10. A method for providing interactive communication
between a user and an information source, characterised
in that it comprises the steps of obtaining a
structured document from the information source in
response to a request from the user using a
non-keyboard based search system, and rendering the
structured document using audio.

11. The method as claimed in claim 10, further comprising
the steps of providing at least one of:

an E-mail support system;

an audio files and streaming support system;

and radio and/or commercials during document download.

12. The method as claimed in claim 10 or 11,
wherein rendering structured documents using audio
comprises the steps of:

performing a pre-rendering process which converts
a structured document into an intermediate
document; and

performing an audio rendering process which
generates an audio output.

13. The method as claimed in claim 12, wherein per- 
forming a pre-rendering process comprises the 
steps of: 

dividing said structured document into logical
segments;

categorising said logical segments as either
navigation segments or content segments;

determining section structure of said structured
document; and

creating said intermediate document which can 
be interpreted by a text-to-speech engine. 

14. The method as claimed in claim 13, wherein the
step of categorising said logical segments comprises
calculating a link density of each of said logical
segments according to the formula:

$$D = \frac{C_{HREF} + K \cdot L_I}{C}$$

where $D$ is said link density, $C_{HREF}$ is the number of
non-tag characters in each of said logical segments
which appear inside of HREF tags, $C$ is the total
number of non-tag characters in each of said logical
segments, $L_I$ is the number of hyperlinks within
image maps in each of said logical segments and $K$
represents a weight given to image map links.

15. The method as claimed in claim 13 or 14, wherein
the step of determining section structure comprises
defining sections hierarchically using header tags
as boundaries with top-level sections forming the
top of a hierarchy and said header tags denoting
subsections, sub-subsections and as many further
gradations as are necessary to account for the
number of prominence values present in the
structured document.

16. The method as claimed in one of claims 13-15,
wherein the step of creating said intermediate
document comprises producing meta-information in a
form of commands which will cause a text-to-speech
conversion means to vary voice, tone, rate,
and other parameters to adequately convey
information within the structured document.



17. The method as claimed in one of claims 10-16, also
comprising providing an interface for information
exchange to users comprising the steps of providing
a non-visual browsing system; and providing an
interface to users for visual browsing environments.

18. A method of providing interactive access to a net- 
work, comprising the steps of: 



providing audio feedback of a browser to a user
and by said user pressing a follow button,
providing most recently played hyperlink;

controlling volume of audio signal by an audio
volume dial;

providing a plurality of modes of browsing
wherein each mode allows said user a specific
kind of rendering of a page based on information
in an intermediate document and wherein
all modes begin a document by speaking title
of said document;

allowing said user to scroll through a document
using scan-forward, scan-reverse and pause
controls;

maintaining a traditional network browser history
list; and

providing for immediate access to a number of
user selected network documents, through use
of a favorite control.